Joining Statistics with NLP for Text Categorization
Abstract
Automatic news categorization systems have achieved high accuracy, consistency, and flexibility using natural language processing techniques. These knowledge-based categorization methods are more powerful and accurate than statistical techniques. However, the phrasal pre-processing and pattern matching methods that seem to work for categorization have the disadvantage of requiring a fair amount of knowledge encoding by human beings. In addition, they work much better at certain tasks, such as identifying major events in texts, than at others, such as determining what sort of business or product is involved in a news event. Statistical methods for categorization, on the other hand, are easy to implement and require little or no human customization, but they offer none of the benefits of natural language processing, such as the ability to identify relationships and enforce linguistic constraints. Our approach has been to use statistics in the knowledge acquisition component of a linguistic pattern-based categorization system, using statistical methods, for example, to associate words with industries and to identify phrases that carry information about businesses or products. Instead of replacing knowledge-based methods with statistics, statistical training replaces knowledge engineering. This has resulted in high accuracy, shorter customization time, and good prospects for applying the statistical methods to problems in lexical acquisition.
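The idea of using statistics to associate words with industries can be illustrated with a minimal sketch. The abstract does not say which statistic the system uses, so pointwise mutual information (PMI) is an assumption here, and the function names `association_scores` and `top_words`, the `min_count` threshold, and the toy headlines are all hypothetical. The sketch only shows how labeled texts could yield candidate word lists for a pattern-based categorizer.

```python
import math
from collections import Counter, defaultdict


def association_scores(labeled_docs, min_count=2):
    """Score word/category association with PMI (an assumed statistic).

    labeled_docs: iterable of (category, text) pairs.
    Returns {category: {word: pmi}} for words seen at least
    min_count times within that category.
    """
    word_in_cat = defaultdict(Counter)  # category -> word counts
    word_total = Counter()              # word -> count over all categories
    cat_total = Counter()               # category -> token count
    for category, text in labeled_docs:
        for word in text.lower().split():
            word_in_cat[category][word] += 1
            word_total[word] += 1
            cat_total[category] += 1
    n = sum(cat_total.values())  # total tokens in the training set

    scores = {}
    for category, counts in word_in_cat.items():
        scores[category] = {
            # PMI = log2( P(word, cat) / (P(word) * P(cat)) )
            word: math.log2((c * n) / (word_total[word] * cat_total[category]))
            for word, c in counts.items()
            if c >= min_count
        }
    return scores


def top_words(scores, category, k=3):
    """Return the k words most associated with a category (ties by word)."""
    ranked = sorted(scores[category].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Toy training data (invented for illustration only).
docs = [
    ("airline", "airline adds flights as airline fares fall"),
    ("airline", "pilots strike grounds flights"),
    ("banking", "bank raises loan rates as bank profits fall"),
    ("banking", "loan losses hurt profits"),
]
scores = association_scores(docs)
airline_words = top_words(scores, "airline", k=2)
```

In a system of the kind the abstract describes, word lists like `airline_words` would feed the linguistic pattern matcher rather than act as a classifier on their own, which is the sense in which statistical training stands in for manual knowledge engineering.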
Publication date: 1992